Abstract. The ICD-9 terminology standardization task aims to map the colloquial terminology recorded by doctors in medical records to the standard terminology defined in the ninth revision of the International Classification of Diseases (ICD-9). In this paper, we first propose a BERT and Text Similarity Based Method (BTSBM) that combines a BERT classification model with a text similarity calculation algorithm: 1) use the N-gram algorithm to generate a Candidate Standard Terminology Set (CSTS) for each colloquial terminology, which serves as the training and test dataset for the next step; 2) use the BERT classification model to select the correct standard terminology. In BTSBM, if a larger-scale CSTS is used as the test dataset, the training dataset must also be large-scale. However, each CSTS contains only one positive sample, so expanding the scale causes a serious imbalance between positive and negative samples, which significantly degrades system performance. If instead the test dataset is kept relatively small, the CSTS Accuracy (CSTSA) drops significantly, which results in a very low system performance ceiling. To address these problems, we then propose an optimized terminology standardization method, called the Advanced BERT and Text Similarity Based Method (ABTSBM), which 1) uses a large-scale initial CSTS to maintain a high CSTSA and thus a high system performance ceiling, 2) denoises the CSTS based on body structure to alleviate the imbalance of positive and negative samples without reducing the CSTSA, and 3) introduces the focal loss function to further balance positive and negative samples. Experiments show that the precision of ABTSBM reaches 83.5%, which is 0.6% higher than that of BTSBM, while the computation cost of ABTSBM is 26.7% lower than that of BTSBM.
1 Introduction


The International Classification of Diseases (ICD) is an internationally unified disease classification system formulated by the WHO. ICD-9 is its ninth revision, which classifies diseases based on surgical operations. The key components of an ICD-9 terminology are the body structure and the surgical name; "Lumbar Discectomy" and "Biopsy of Joint" are two typical instances.


At present, the medical terminologies that doctors record in medical records often contain abbreviations and colloquialisms. Doctors may also describe a procedure at an excessively fine-grained or coarse-grained level. In addition, hospitals and institutions may use their own self-defined standard terminologies. As a result, there is a heavy barrier to academic communication in the medical research field, and hospitals and doctors have a real need to map these medical terminologies to unified standard ones. For medical insurance, too, a unified name for the same disease recorded with different descriptions helps to quantify insurance compensation.

In hospitals, manual terminology standardization requires professional knowledge. Given the huge number of ICD-9 standard terminologies and the numerous colloquial terminologies produced by hospitals every day, it is a time-consuming and laborious job. Therefore, automatic standardization of medical terminology has become an urgent need for hospitals and doctors.

The aim of this paper is to find the corresponding standard ICD-9 terminology for a given original medical terminology (namely, the colloquial terminology). We first propose a BERT and Text Similarity Based Method (BTSBM) that combines the BERT classification model with a text similarity calculation algorithm: 1) use the N-gram algorithm to filter out the ICD-9 standard terminologies that are highly similar to the original terminology. The standard terminologies with the N highest similarities (Top-N) form the Candidate Standard Terminology Set (CSTS) of each original terminology, and the CSTSs are used as the training and test datasets for the next step; 2) use the BERT classification model to predict the standard terminology of the original terminology. We do not predict the correct standard terminology against all ICD-9 standard terminologies because that would involve too many irrelevant items and too large a computation cost. Fig. 1 shows the framework of BTSBM. The BERT classification model predicts a negative or positive (i.e. 0 or 1) label for each candidate terminology, and the candidate with the highest probability among all candidate terminologies is chosen as the standard terminology.

However, BTSBM suffers from an imbalance between positive and negative samples. The scale of the CSTS is positively correlated with the CSTS Accuracy (CSTSA, defined in formula 5), and CSTSA determines the upper limit of system performance. If a larger-scale CSTS is used in the test dataset, the training dataset also needs a large-scale CSTS in order to achieve good system performance and robust generalization. Because each original terminology corresponds to exactly one standard terminology, the larger N is, the more negative samples the CSTSs contain, which causes a serious imbalance between positive and negative samples. As a consequence, system performance degrades significantly. If instead a small-scale CSTS is used as the test dataset, the low CSTSA imposes a low system performance ceiling, which is unacceptable both in academic research and in real-world applications.

To address these problems, we propose an optimized terminology standardization method, called the Advanced BERT and Text Similarity Based Method (ABTSBM): 1) use large-scale initial CSTSs to maintain a high CSTSA and thus a high system performance ceiling; 2) use a body structure based data denoising technique to reduce the imbalance and the computation cost without affecting CSTSA; 3) use the focal loss function to further address the imbalance in the training dataset and improve system performance. As a result, we effectively alleviate the serious imbalance between positive and negative samples caused by large-scale CSTSs.
Fig. 1. The framework of BTSBM. Based on text similarity calculation, the CSTS corresponding to the original terminology is generated. Then, pairs of the original terminology and each of its candidate terminologies are input into the BERT classification model, and the candidate terminology with the highest probability is output.


2 Related work 


Regarding the standardization of ICD terminology, Liu [1] developed a complete entry system for ICD-10, in which doctors enter standard terminologies directly because the content they fill in is standardized. Cheng [2] improved the dictionary of the work presented in [1]. However, these methods not only rely on the input of doctors but also cannot standardize old medical records, whose large amount of information is exactly what doctors cannot ignore when doing research.

We believe that the ICD-9 terminology standardization task can be formalized as a text similarity task based on deep learning. Existing work on text similarity includes string-based text similarity calculation algorithms and neural network based methods. String-based methods have been surveyed in [3-5]; examples include N-gram [6], Longest Common Subsequence [7] and Edit Distance [8]. Yu et al. [9] used the Jaccard distance to calculate the similarity between two texts. Sidorov et al. [10] proposed an algorithm for tree edit distance. Neural network based methods mainly calculate similarity from word vectors. Kenter et al. [11] used word vectors of different dimensions to train a classifier that predicts the similarity score between short texts. Mikolov et al. [12] and Pennington et al. [13] proposed Word2Vec and GloVe, respectively, to generate word vectors. Devlin et al. [14] proposed the pre-trained model BERT, which can predict the similarity between sentence pairs through 0-1 binary classification of the pair.


3 Method 


3.1 BTSBM 


As shown in Fig. 1, BTSBM consists of two parts: text similarity and BERT. The first part uses a string-based text similarity calculation algorithm to compute the similarity between the original terminology and each ICD-9 standard terminology, and then takes the ICD-9 standard terminologies with the Top-N highest similarities as the CSTS of each original terminology. The second part uses the BERT classification model to predict the similarity between each candidate terminology and the original terminology, and outputs the candidate terminology with the highest predicted probability.

In the first part, constructing the CSTS effectively screens out the ICD-9 standard terminologies that are highly similar to the original terminology. This avoids the huge computation cost that the BERT classification model would incur from pairing each original terminology with every ICD-9 standard terminology, and it also filters out some interference items that may affect the BERT prediction.

Commonly used text similarity calculation algorithms include the N-gram algorithm, the Longest Common Subsequence algorithm and the Edit Distance algorithm. The basic idea of the N-gram algorithm is to divide a terminology into sub-sequences of length N, called grams, and then count the number of grams shared by two strings to measure their similarity.

In BTSBM, we select the N-gram algorithm to calculate the similarity between the original terminology and the ICD-9 standard terminologies, and then screen the Top-N most similar ICD-9 standard terminologies to construct the CSTS. The N-gram similarity is given in Formula 1, where Ngram(·,·) is the number of grams shared by the two terminologies and len(·) is the length of a terminology.
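$$\mathrm{sim}(\mathrm{Terminology}_i, \mathrm{Terminology}_j) = \frac{2 \times \mathrm{Ngram}(\mathrm{Terminology}_i, \mathrm{Terminology}_j)}{\mathrm{len}(\mathrm{Terminology}_i) + \mathrm{len}(\mathrm{Terminology}_j)} \qquad (1)$$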
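Formula (1) together with the Top-N selection can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes character-level grams of length n = 2 (the gram length used in the experiments is not stated), counts shared grams as a multiset intersection, and takes len(·) to be the character length. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Multiset of character n-grams ("grams") of a terminology string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(t_i: str, t_j: str, n: int = 2) -> float:
    """Formula (1): 2 * (number of shared grams) / (len(t_i) + len(t_j)).

    Assumption: shared grams are counted as a multiset intersection and
    len() is the character length of each terminology.
    """
    total_len = len(t_i) + len(t_j)
    if total_len == 0:
        return 0.0
    shared = sum((char_ngrams(t_i, n) & char_ngrams(t_j, n)).values())
    return 2 * shared / total_len


def build_csts(original: str, standard_terms: list[str], top_n: int,
               n: int = 2) -> list[str]:
    """Candidate Standard Terminology Set: the Top-N most similar standard terms.

    sorted() is stable, so ties keep the order of standard_terms.
    """
    ranked = sorted(standard_terms,
                    key=lambda term: ngram_similarity(original, term, n),
                    reverse=True)
    return ranked[:top_n]
```

Because Chinese terminologies are character sequences, character grams apply to them directly; the same functions work unchanged on Chinese strings.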


In the second part, the BERT classification model works in binary classification mode and predicts whether the original terminology is similar to each of its candidate terminologies, as shown in Fig. 1. However, because the standard terminology corresponding to an original terminology is unique, we do not use the label output by BERT; instead, we take the candidate terminology with the highest probability among all candidate terminologies. Two examples are shown in Table 1. In the case of "Allogeneic Kidney Transplantation", although multiple candidate terminologies are predicted as positive with very similar probabilities, "Kidney Allograft Transplantation", which has the highest probability, is the correct standard terminology. Likewise, in the case of "Pelvic External Fixation", even though BERT judges the original terminology and its candidates to be dissimilar because all their probabilities are below 0.5, the candidate terminology "Pelvic External Fixation" with the highest probability is still chosen as the correct standard terminology.


Table 1. Instances of BERT prediction results. 


Original Terminology                Candidate Terminology               Probability of Positive Sample
Allogeneic Kidney Transplantation   Kidney Allograft Transplantation    0.927
                                    (unrecoverable)                     0.922
Pelvic External Fixation            Pelvic External Fixation            0.091
                                    (unrecoverable)                     0.066
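The selection rule described in this section reduces to an argmax over the candidates' positive-class probabilities. The sketch below assumes the BERT probabilities have already been computed (model inference is not shown) and reuses the values from Table 1; the function names and the placeholder name "other candidate" are hypothetical.

```python
def select_standard_terminology(candidate_probs: dict[str, float]) -> str:
    """Pick the candidate with the highest positive-class probability.

    The thresholded 0/1 label (p >= 0.5) is deliberately not used, so exactly
    one candidate is returned even when several (or none) are predicted positive.
    """
    if not candidate_probs:
        raise ValueError("empty CSTS")
    return max(candidate_probs, key=candidate_probs.get)


def predicted_labels(candidate_probs: dict[str, float],
                     threshold: float = 0.5) -> dict[str, int]:
    """The 0/1 labels a thresholded classifier would output; shown for contrast."""
    return {term: int(p >= threshold) for term, p in candidate_probs.items()}


# Second case of Table 1: both candidates fall below 0.5.
# "other candidate" stands in for the unrecoverable second term.
pelvic = {"Pelvic External Fixation": 0.091, "other candidate": 0.066}
print(predicted_labels(pelvic))             # every label is 0
print(select_standard_terminology(pelvic))  # still returns the best candidate
```

Taking the argmax rather than the label guarantees one answer per original terminology, which matches the task's one-standard-terminology assumption.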


3.2 ABTSBM 


In BTSBM, if a larger-scale CSTS is used for the test dataset, a larger-scale CSTS is also required for the training dataset so that the model learns enough features to distinguish the interference items in the test dataset. However, each CSTS contains only one positive sample, because each original terminology has only one corresponding standard terminology. As the CSTS expands, the number of negative samples grows, causing a serious imbalance between positive and negative samples in the training dataset, which significantly reduces system performance. Taking the CSTS constructed by Top-30 as an example, the ratio of positive to negative samples is 1:29. Moreover, the accuracy of the BERT classifier is not the same as the precision of the terminology standardization task: the former is evaluated on whether each 0-1 label is correct, rather than on whether the candidate terminology with the highest probability is the correct standard terminology. So in the extreme case where every pair is classified as label 0, the BERT classification accuracy can still reach 97%, while the precision of the terminology standardization task is 0%. Therefore, the sparse positive samples make the model difficult to train. However, if the test dataset uses a small-scale CSTS, the CSTSA drops directly, and this lower CSTSA sets the upper limit of system performance, which obviously prevents high system performance.
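The 97% figure follows directly from the Top-30 ratio. Assuming every CSTS contains exactly 30 candidates, one of them positive, labeling every pair 0 misclassifies only the single positive pair:

$$\text{label accuracy} = \frac{29}{30} \approx 96.7\% \approx 97\%,$$

whereas, as stated above, the task precision in this extreme case is 0%.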

To solve the above problems, we propose ABTSBM, an optimized terminology standardization method, as shown in Fig. 2. Compared with BTSBM, it uses two methods to alleviate the imbalance of positive and negative samples caused by a large-scale dataset: 1) denoise the dataset based on body structure to delete irrelevant candidate terminologies; 2) use the focal loss function to enhance the ability of the BERT classification model to learn from an unbalanced training dataset.
Fig. 2. The framework of ABTSBM, which consists of three parts: a text similarity based CSTS construction module, a body structure based dataset denoising module, and a BERT classification model with the focal loss function.


Denoising methods. The CSTS obtained by the N-gram algorithm contains some candidate terminologies that are highly similar to the original terminology but obviously incorrect. Some examples are shown in Table 2. In the original terminology "Extraction of Vocal Cord Lesions under Support Laryngoscope", the supplementary information "support laryngoscope" does not match the information "endoscope" in its standard terminology. As a result, by the N-gram algorithm, "Vocal Cord Injection under Support Laryngoscope" is more similar to the original terminology than the standard terminology is. In the case of "Endoscopic Right Thyroid Lobectomy", the body structure of the candidate terminology is "Adenoids", which obviously contradicts the body structure "Thyroid Lobe" of the original terminology. Based on these data characteristics, we investigate two denoising methods: the Denoising method based on Supplementary Information (DSI) and the Denoising method based on Body Structure (DBS).


Table 2. Instances of original terminologies with their standard terminologies and candidate terminologies.

Original Terminology                                          Standard Terminology   Candidate Terminology
Extraction of Vocal Cord Lesions under Support Laryngoscope   (unrecoverable)        Vocal Cord Injection under Support Laryngoscope
Endoscopic Right Thyroid Lobectomy                            (unrecoverable)        (unrecoverable; body structure "Adenoids")


DSI removes the supplementary information (such as the surgical approach, endoscope name and location words) from the original terminology and the ICD-9 standard terminologies. As a result, each processed candidate terminology consists only of components that directly determine the standard terminology (such as the body structure and the surgical name).
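A minimal sketch of the DSI step, assuming the supplementary phrases are available as a hand-built list and that removal is plain substring deletion. The list used in the experiments is not given, so the phrases below are hypothetical, and English glosses are used for readability; for Chinese terms the same substring deletion applies, and the whitespace cleanup is simply a no-op.

```python
def strip_supplementary(term: str, supplementary: list[str]) -> str:
    """Remove supplementary-information phrases from a terminology.

    Longer phrases are removed first so that, e.g., "under support laryngoscope"
    is removed as a whole before a shorter phrase such as "laryngoscope".
    Leftover whitespace is collapsed (relevant for English glosses only).
    """
    for phrase in sorted(supplementary, key=len, reverse=True):
        term = term.replace(phrase, "")
    return " ".join(term.split())


# Hypothetical supplementary phrases: an approach/endoscope name and a location word.
SUPPLEMENTARY = ["endoscopic", "right", "laryngoscope", "under support laryngoscope"]
print(strip_supplementary("endoscopic right thyroid lobectomy", SUPPLEMENTARY))
```

Plain substring deletion is naive (a location word could occur inside a longer word); a real system would need a curated phrase list or token-level matching.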

DBS first uses a BiLSTM-CRF model [15] to perform named entity recognition and extract the body structures of the original terminology and its candidate terminologies. It then compares the body structures of the original terminology with those of each candidate terminology: if a candidate terminology contains a body structure that does not appear in the original terminology, the two terminologies contradict each other in body structure, and the candidate terminology is discarded. Some examples are given in Table 3. The original terminology "Gastric Perforation Repair" contains the body structure "stomach", while its candidate terminology "Bowel Perforation Repair" contains the body structure "bowel"; because of this body structure contradiction, the candidate is discarded. The other candidate terminology, "Gastric Repair", contains the same body structure as the original terminology, so it is retained. To obtain the training dataset for BiLSTM-CRF, we manually annotate the ICD-9 standard terminologies with the BIO tagging scheme.
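The comparison rule of DBS can be sketched as below. The BiLSTM-CRF model itself is not reproduced: the sketch assumes its per-character BIO tags are already available and only shows decoding them into entity strings and applying the "no extra body structure" test. The tag name "BODY" and all function names are hypothetical.

```python
def bio_to_entities(tokens: list[str], tags: list[str]) -> set[str]:
    """Collect entity strings from per-token BIO tags ("B-X", "I-X", "O").

    An "I-" tag with no open entity is treated like "O". Tokens are joined
    without spaces, which suits character-level Chinese input.
    """
    entities: set[str] = set()
    current: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.add("".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.add("".join(current))
            current = []
    if current:
        entities.add("".join(current))
    return entities


def dbs_filter(original_structures: set[str],
               candidate_structures: dict[str, set[str]]) -> list[str]:
    """Keep a candidate only if every body structure it contains also appears
    in the original terminology; otherwise the two contradict and it is dropped."""
    return [term for term, structures in candidate_structures.items()
            if structures <= original_structures]


# Table 3 example with body structures already extracted (English glosses).
kept = dbs_filter({"stomach"},
                  {"Gastric Repair": {"stomach"},
                   "Bowel Perforation Repair": {"bowel"}})
print(kept)
```

Note that a candidate with no recognized body structure passes the subset test, matching the rule that only a contradicting body structure triggers removal.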


Table 3. Instances of the denoising method based on body structure.

Original Terminology (Body Structure)   Candidate Terminology (Body Structure)   Decision
(unrecoverable)                         (unrecoverable)                          retain
                                        (unrecoverable)                          discard
Gastric Perforation Repair (stomach)    Gastric Repair (stomach)                 retain
                                        Bowel Perforation Repair (bowel)         discard


By removing supplementary information, DSI avoids some interference items during CSTS construction, but it does not change the size of the CSTS. DBS discards candidate terminologies that are unrelated to the original terminology by comparing body structures, which effectively alleviates the imbalance between positive and negative samples without affecting CSTSA. Taking the initial CSTS constructed by Top-30 as an example, DBS reduces the CSTS scale from Top-30 to AVG-22 (i.e. each CSTS contains 22 candidate terminologies on average), which significantly reduces the imbalance between positive and negative samples and reduces the computational cost of the model by 26.7%.
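As a worked check of these figures, assuming one positive sample per CSTS: DBS changes the average candidate count from 30 to 22, so

$$\text{positive} : \text{negative} = 1:29 \;\rightarrow\; 1:21, \qquad \frac{30 - 22}{30} = \frac{8}{30} \approx 26.7\%.$$

The second quantity is the relative reduction in the number of (original, candidate) pairs that BERT must score, which equals the reduction in computation cost under the assumption that every pair costs the same to process.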


The BERT classification model with focal loss function. Although DBS effectively alleviates the imbalance between positive and negative samples, the imbalance still exists. Taking Top-30 again as an example, the positive-to-negative ratio after denoising is 1:21, which is still too skewed. Therefore, we further use the focal loss function [16] to alleviate this problem.

The original BERT classification model uses the cross entropy loss function, shown in formula 2, where y ∈ {0, 1} is the true label and y' is the predicted probability of the positive class. The focal loss function, shown in formula 3, improves on the cross entropy loss by adding two parameters, α and γ: γ adjusts the contribution of difficult samples to the loss, and α controls the relative weight of positive and negative samples. In ABTSBM, we consider positive and negative samples equally important but need to control the impact of difficult samples on the loss, so we set α = 0.5 and γ = 2.


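$$L_{ce} = \begin{cases} -\log y', & y = 1 \\ -\log(1 - y'), & y = 0 \end{cases} \qquad (2)$$

$$L_{fl} = \begin{cases} -\alpha (1 - y')^{\gamma} \log y', & y = 1 \\ -(1 - \alpha)\, y'^{\gamma} \log(1 - y'), & y = 0 \end{cases} \qquad (3)$$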
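Formulas (2) and (3) can be written per sample as below. This is a scalar sketch for illustration only, assuming 0 < y' < 1; an actual BERT fine-tuning setup would apply the same expressions batched over tensors in a deep learning framework. The function names are hypothetical.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Formula (2): binary cross entropy; p is the predicted positive probability y'."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def focal_loss(p: float, y: int, alpha: float = 0.5, gamma: float = 2.0) -> float:
    """Formula (3): alpha weights positives vs negatives,
    gamma down-weights easy (confidently correct) samples."""
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


# With alpha = 0.5 and gamma = 2 (the values used in ABTSBM), compare an easy
# negative (p = 0.1) with a hard negative (p = 0.9).
for p in (0.1, 0.9):
    ce, fl = cross_entropy(p, 0), focal_loss(p, 0)
    print(f"p={p}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.3f}")
```

For a negative sample the ratio FL/CE equals (1 − α)·y'^γ, i.e. 0.005 for the easy negative and 0.405 for the hard one, so the many easy negatives in a CSTS contribute far less to the loss than the few hard ones. With γ = 0, the focal loss reduces to cross entropy scaled by α or 1 − α.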


With the focal loss function, the model becomes better at distinguishing candidate terminologies that are difficult to tell apart. As shown in Table 4, with the cross entropy loss function, when two candidate terminologies are very similar, the difference between their output probabilities may be extremely small, which suggests that the model cannot handle such difficult cases effectively. For example, in the first case of Table 4, the candidate terminology with the highest probability is the correct standard terminology, whereas in the second case, the candidate with the second-highest probability is the correct one. In terms of probability, however, the candidates in both cases are almost indistinguishable. After the focal loss function is applied, the probability difference between the two candidates is enlarged, and the model becomes able to handle these difficult cases.


Table 4. Prediction probability comparison between the cross entropy loss model and the focal loss model.

Original Terminology     Candidate Terminology      Probability of Positive Sample
                                                    Cross Entropy Model   Focal Loss Model
Case 1 (unrecoverable)   (unrecoverable; correct)   0.9989103             0.9201928
                         (unrecoverable)            0.99729234            0.14210172
Case 2 (unrecoverable)   (unrecoverable)            0.9998492             0.05657335
                         (unrecoverable; correct)   0.9978193             0.93587375


4 Experiment and analysis 


4.1 Experimental data


The experimental data come from the ICD-9 terminology standardization academic competition organized by CHIP2019.

The CHIP2019 competition provides 9866 ICD-9 standard terminologies and 5492 terminology pairs (each pair consists of an original terminology and its ICD-9 standard terminology), of which 3642 are used as the training dataset and the remaining 1850 as the test dataset. In both BTSBM and ABTSBM, if the correct standard terminology is not contained in a CSTS used for training, we manually add it to that CSTS.


4.2 Evaluation metrics 


We use the precision defined by CHIP2019 as the final evaluation metric, as shown in formula 4. Because of the characteristics of this task, the final evaluation considers only precision. However, CSTSA is indicative of the quality of CSTS construction, so it should still be considered during the CSTS construction process; it is defined in formula 5.


$$\mathrm{precision} = \frac{\text{The number of correct pairs of original and standard terminology}}{\text{Total number of original terminologies}} \qquad (4)$$

$$\mathrm{CSTSA} = \frac{\text{Total number of CSTSs containing the correct standard terminology}}{\text{Total number of original terminologies}} \qquad (5)$$
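Both metrics are simple ratios over the original terminologies. A minimal sketch, assuming predictions, gold labels and CSTSs are all keyed by the original terminology (a hypothetical data layout; the function names are also hypothetical):

```python
def precision(predictions: dict[str, str], gold: dict[str, str]) -> float:
    """Formula (4): correctly standardized originals / total originals."""
    correct = sum(1 for original, standard in gold.items()
                  if predictions.get(original) == standard)
    return correct / len(gold)


def csts_accuracy(csts: dict[str, list[str]], gold: dict[str, str]) -> float:
    """Formula (5): CSTSs containing the correct standard term / total originals."""
    hits = sum(1 for original, standard in gold.items()
               if standard in csts.get(original, []))
    return hits / len(gold)
```

Since every prediction is drawn from its own CSTS, a prediction can be correct only when that CSTS contains the correct standard terminology, so precision never exceeds CSTSA. This is why CSTSA acts as the system performance ceiling.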


4.3 Experimental results and analysis 


CSTSA. We compare the CSTSA of BTSBM after each of the two denoising methods with that of the original BTSBM, as shown in Table 5. The Top-N column gives the CSTS scale. The BTSBM column gives the CSTSA of BTSBM, and the BTSBM-DSI column gives the CSTSA of BTSBM after DSI. The AVG-N (for BTSBM-DBS) column gives the average CSTS scale of the Top-N CSTS after DBS, with the original scale in brackets, and the BTSBM-DBS column gives the CSTSA of BTSBM-DBS. Note that because ABTSBM also uses DBS, the CSTSA of ABTSBM is the same as that of BTSBM-DBS.


Table 5. The CSTSA (%) of BTSBM-DSI and BTSBM-DBS compared with BTSBM.

Top-N    BTSBM   BTSBM-DSI   AVG-N (for BTSBM-DBS)   BTSBM-DBS
Top-15   84.8    83.9        AVG-12 (Top-15)         84.8
Top-20   86.1    86.2        AVG-15 (Top-20)         86.1
Top-30   89.5    89.1        AVG-22 (Top-30)         89.5
Top-40   90.4    90.4        AVG-30 (Top-40)         90.4
Top-50   91.4    91.3        AVG-35 (Top-50)         91.3


Table 5 shows that DSI hardly affects CSTSA when N is large. DBS reduces the original Top-15, Top-20, and Top-30 datasets to average sizes of AVG-12, AVG-15, and AVG-22, respectively, which significantly reduces the imbalance between positive and negative samples while leaving CSTSA almost unchanged.
Experimental results of BTSBM. The precision of BTSBM is shown in Table 6. The dataset for the BERT classification model is constructed with the N-gram algorithm and consists of a training dataset with Top-N1 scale CSTSs and a test dataset with Top-N2 scale CSTSs.


Table 6. The precision (%) of BTSBM.

Training Dataset   Test Dataset: Top-20   Test Dataset: Top-30
Top-15             79.2                   81.3
Top-20             79.9                   82.6
Top-30             80.0                   82.9


Experimental results of ABTSBM. We compare the precision of BTSBM after DSI (BTSBM-DSI) and BTSBM after DBS (BTSBM-DBS) with that of the original BTSBM in Table 7. We also compare the precision of BTSBM with the focal loss function (BTSBM-FL) with that of the original BTSBM in Table 8. Finally, we compare the precision of ABTSBM with that of BTSBM in Table 9.


Table 7. The precision (%) of BTSBM-DSI and BTSBM-DBS compared with BTSBM. BTSBM and BTSBM-DSI cells give precision on the Top-20 / Top-30 test datasets; BTSBM-DBS cells give precision on the AVG-15 (Top-20) / AVG-22 (Top-30) test datasets after DBS.

Training Dataset   BTSBM         BTSBM-DSI     Training Dataset after DBS   BTSBM-DBS
Top-15             79.2 / 81.3   78.5 / 79.3   AVG-12 (Top-15)              79.8 / 81.5
Top-20             79.9 / 82.6   78.8 / 80.3   AVG-15 (Top-20)              80.4 / 82.8
Top-30             80.0 / 82.9   78.6 / 80.6   AVG-22 (Top-30)              80.4 / 83.1


The comparison in Table 7 shows that BTSBM-DSI does not achieve better precision than BTSBM, even though DSI removes supplementary information to avoid candidate terminologies in the CSTS that are highly similar to the original terminology but irrelevant. With the training dataset size used in this paper, BTSBM can already learn features related to supplementary information to determine whether a candidate is an irrelevant interference item. In contrast, DBS reduces the training dataset from Top-30 scale to AVG-22 scale, i.e. 26.7% smaller than that of BTSBM, which reduces the imbalance between positive and negative samples. Moreover, DBS causes no loss of CSTSA. As shown in Table 7, when both the training and test datasets are Top-30 (AVG-22 after DBS), the precision of BTSBM-DBS is 0.2% higher than that of BTSBM, which illustrates the effectiveness of DBS.
Table 8. The precision (%) of BTSBM-FL compared with BTSBM. Each cell gives precision on the Top-20 / Top-30 test datasets.

Training Dataset   BTSBM         BTSBM-FL
Top-15             79.2 / 81.3   80.1 / 82.4
Top-20             79.9 / 82.6   80.5 / 83.2
Top-30             80.0 / 82.9   80.5 / 83.5


Table 8 shows that, on exactly the same dataset without denoising, BTSBM-FL improves precision by 0.6% when both the training and test datasets are Top-30. This illustrates that the imbalance between positive and negative samples does affect model training, and that the focal loss function helps the model learn from and predict on the unbalanced data in this task.


Table 9. The precision (%) of ABTSBM compared with BTSBM. BTSBM cells give precision on the Top-20 / Top-30 test datasets; ABTSBM cells give precision on the AVG-15 (Top-20) / AVG-22 (Top-30) test datasets after DBS.

BTSBM Training Dataset   BTSBM         ABTSBM Training Dataset (after DBS)   ABTSBM
Top-10                   79.0 / 80.5   AVG-12 (Top-15)                       79.4 / 81.7
Top-15                   79.2 / 81.3   AVG-15 (Top-20)                       80.7 / 83.2
Top-20                   79.9 / 82.6   AVG-22 (Top-30)                       80.5 / 83.5
Top-30                   80.0 / 82.9   -                                     -


The horizontal comparison in Table 9 shows that when the positive-to-negative ratios of ABTSBM and BTSBM are almost the same, ABTSBM achieves higher precision than BTSBM. When ABTSBM is trained on AVG-22 (Top-30) after DBS and BTSBM is trained on Top-20 without DBS, ABTSBM achieves 0.9% higher precision than BTSBM (83.5% vs 82.6%). The diagonal comparison shows that ABTSBM not only reduces the dataset from the original Top-30 size to AVG-22, a 26.7% reduction that significantly lowers the computational cost and the imbalance between positive and negative samples, but also achieves 0.6% higher precision than BTSBM on the Top-30 test dataset.


5 Conclusion 


In summary, we first propose BTSBM, which combines BERT with text similarity. We then propose an optimized terminology standardization method, ABTSBM, which 1) uses a large-scale initial CSTS to maintain a high CSTSA and thus a high system performance ceiling, 2) uses DBS to reduce the size of the CSTS without affecting CSTSA, which both reduces the computational cost and alleviates the imbalance between positive and negative samples, and 3) uses a BERT classification model with the focal loss function to improve the model's ability to learn from unbalanced data. ABTSBM reaches a precision of 83.5%, which is 0.6% higher than BTSBM, while reducing the computation cost by 26.7%.